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Abstract —A current trend in networking and cloud computing is to 
provide compute resources at widely distributed sites; this is exemplified 
by developments such as Network Function Virtualisation. This paves 
the way for wide-area service deployments with improved service quality: 
e.g. user-perceived response times can be reduced by offering services 
at nearby sites. But always assigning users to the nearest site can be a 
bad decision if this site is already highly utilised. This paper formalises 
two related decisions of allocating compute resources at different sites 
and assigning users to them with the goal of minimising the response 
times while the total number of resources to be allocated is limited - a 
non-linear capacitated Facility Location Problem with integrated queuing 
systems. To efficiently handle its non-linearity, we introduce five linear 
problem linearisations and adapt the currently best heuristic for a similar 
scenario to our scenario. All six approaches are compared in experiments 
for solution quality and solving time. Surprisingly, our best optimisation 
formulation outperforms the heuristic in both time and quality. Additionally, 
we evaluate the influence of distributions of available compute resources 
in the network on the response time: The time was halved for some 
configurations. The presented formulation techniques for our problem 
linearisations are applicable to a broader optimisation domain. 

Index Terms —cloud computing; next generation networking, virtual 
network function; network function virtualisation; resource management; 
placement; facility location; queueing model; piecewise linear function; 
problem linearisation; optimisation; mixed integer function; derivative; 
erlang delay formula; 

1 Introduction 

1.1 Opportunities and Problem Statement 

Providing resources at widely distributed sites is a new 
trend in networking and cloud computing. It is known 
under different labels, for example. Carrier Clouds [3], [9], 
[47], Distributed Cloud Computing [2], [17], [41], or In- 
Network Clouds [24], [45], [46]. These In-Network Clouds 
are often geared towards specific network services, e.g., 
firewalls or load balancers - a development popularised 
as Network Function Virtualisation [18]. Such deployments 
have an important advantage: Their compute resources are 
closer to end users than those of conventional clouds, have 
hence smaller latency between user and resource, and are 
therefore suitable for highly interactive services. Examples 
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for such services are streaming applications [4], [49], user- 
customised streaming [5], [27], or cloud gaming [34]. For 
such applications, the crucial quality metric is the user- 
perceived response time to a request. Large response times 
impede usability, increase user frustration [11], or prevent 
commercial success. 

As detailed in a prior publication [30], these response 
times comprise three parts: A request's round trip time (RTT) 
in the network, the actual processing time (PT) of a request, 
and its queuing delay (QD) when it has to wait for free 
compute resources (Figure 1). A simple attempt to provide 
small response times would be to allocate some resources 
at many sites so that each user can access the services at a 
nearby resource 1 . This, however, is typically infeasible as 
each used resource incurs additional costs, e.g., consumes 
energy or has leasing fees. We hence have to decide a) where 
user requests shall be processed - the assignment decision 
- and b) how many resources shall process the requests at 
each site - the allocation decision. Both decisions are mutually 
dependent as exemplified in the next section; the resulting 
problem is called the user assignment and server allocation 
problem. 

Assignments and allocations influ¬ 
ence the queuing delay in two ways: 

Allocation modifies the number of allo¬ 
cated resources y at a site; assignment 
changes the utilisation of allocated re¬ 
sources at a site (for a fixed y). 

Once the number of allocated re¬ 
sources at a site is insufficient to avoid 
queuing delays for the load assigned 
to this site, queues will build up and 
the service quality perceived by the 
user will suffer. This danger is especially high for compute¬ 
intensive services, where even small load can lead to con¬ 
siderable queuing delays. Only little research (Section 2) on 

1. Our paper is part of broader research where services are deployed 
in Virtual Machines (VM) on geographically distributed sites, each man¬ 
aged by an IaaS-software like OpenStack. The work presented by this 
paper also applies to scenarios where a physical host exclusively runs a 
service. To emphasis this generality, we abstractly talk about "allocating 
compute resources" instead of "launching virtual machines", without 
bias for either implementation approach. To simplify explanation, we 
assume allocating homogeneous resources at one side e.g. a slice of 
2 CPUs, 4 GB RAM. Heterogeneous resources can be modelled via 
simple model transformation (Section 3.2). 
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server allocation has yet taken these queuing delays into 
account. 

This paper investigates deploying computation-intensive 
services at compute resources distributed throughout the net. 
Which distributed compute resource is appropriate depends 
on a user request's source site (and their request rate). The 
service deployment objective is to minimise the response 
time while using exactly p compute resources. 2 

When ignoring the queuing delay, assigning user requests 
to resources allocated nearby reduces the round trip time. 
This problem is hard when exactly p resources are available. 
When ignoring the response time, on the other hand, the 
queuing delay is minimised when all requests are assigned to 
a single site to which all p resources are allocated [8]. Hence, 
the queuing delay and round trip time are minimised by two 
conflicting allocation schemes. To find an optimal solution 
for the sum of queueing delay and round trip time, the two 
allocation schemes have to be balanced to find assignments 
and allocations that minimise the response time. 

Similarly, the assignment should prefer faster resources, 
even if their round trip time is a bit longer: Using faster 
resources reduces the processing time. This time reduction 
can compensate for the longer round trip time, in total 
reducing the response time. 

1.2 Queuing Delay Effects 

Consider graph G below Table 1 as an example: Two 
clients c a ,Cb are connected to three facilities /1..3 having 
10 resources available, fci..3=10, such as very large data 
centres. All resources within one facility are homogeneous 
but they differ among the facilities, having service rates 
f<i..3=60; 120; 60 re q/s for facility 1; 2; 3, respectively. The 
assignable requests are limited to 98% of the service rate 
to avoid very high queuing delays due to its asymptotic 
behaviour. This way, the queuing systems are also in 
steady state. The number of resources to use are limited 
to p= 5. Round trip times between clients and facilities 
are ( o ,i..3=4,3..i= 40; 70; 100 re< r/s. Client c a 's arrival rate is 
slightly larger than cfs arrival rate; this slight distortion 
leads to unique solutions. Each row in Table 1 corresponds 
to one level of arrival rates. In the columns, two solutions 
are compared which are obtained by either ignoring (P) 
or considering (QP) the queuing delays when deciding 
allocation and assignment. To compare them directly, for 
both solutions the following measurements were listed: The 
average response times 0RT computed with queuing delays 
in both cases and the times in queuing systems at each 
facility. When comparing the average response times of 
both solutions in the middle of the table; stars mark the 
superior solution. All QP's solutions are at least equal to 
opt P's solutions; most are superior. 

This superiority has two causes: When allocating re¬ 
sources, problem P only minimises the round trip time 
and, thus, assign as many requests as possible nearby. 
Consequently, if all resources are used at the nearest facility, 
additional resources are allocated somewhere. In contrast, 
in our example QP allocates additional resources where 
the queuing delay is reduced the most; cf. ti..3 for A a <60. 

2. We could also limit to at most p resources, but since more resources 
always improve service quality, we use exactly p resources. 


Similarly, the small resource capacity /t | forces P to assign 
more assignments to f 2 when increasing A a >200, pi<p 2 - In 
contrast, Q P assignments are shifted at lower arrival rate to 
f- 2 as the queuing time reduction compensates for the higher round 
trip time. Even with the same allocations 3/1. 3 in A o =120, 
QP balances the assignments better than P which results in 
shorter queueing delays £1.3. 

Table 1 supports another observation: The response time 
differences between P and Q P are small at low or very high 
system utilisation; system utilisation is the fraction of the 
total arrival rate to the total capacity, £/ x f/j2 f k fut- A high 
system utilisation enforces one solution where all resources 
are fully utilised (1). At low system utilisation, the gain of 
additionally considering the queuing delay is small, A a <100. 
However, the tide is turning for medium utilisation, where 
QP's response times can be significantly smaller than P's 
response time, e.g., for A o =180 the 0RTqp is 26.41% of 
0RT P . 

0.98ppi=294 < 550 < 588=0.98 pp 2 (1) 

S -V-" S -V-" 

cap/i A a +Ab cap/ 2 

Assignment and allocation are mutually dependent; 
making a simple separation into subproblems and solving 
them sequential is suboptimal. When assigning demand first, 
one facility / could exist whose amount of assigned requests 
requires allocating more resources as globally allowed (limit 
p); as a simple example imagine 10 users are assigned to 
10 nearby facilities but only p =8 resources are allowed to 
be used. They only way to fix this problem is to adjust the 
assignment to the need of the allocation. Allocating resources 
first without assigning requests beforehand is always feasible. 
But without knowing and assigning the user requests at the 
same time, the round trip times between user and facilities 
with resources are most likely longer than otherwise. Either 
way, the solutions are better when assignment and allocation 
are obtained together. This in turn significantly increases the 
complexity of solving the problem to optimality. 

This paper casts this problem as a queue-extended p- 
median Facility Location Problem (Section 3). To efficiently 
solve this non-linear optimization problem, five problem 
linearisations present different modelling approaches, each 
with different trade-offs between accuracy and search space 
size (Section 4). Additionally, a well-known heuristic for 
a similar problem is adjusted to our problem (Section 5). 
Finally, three aspects are evaluated: First, different basepoint 
sets for the function linearisations are discussed to reduce 
the linearisation error and the number of basepoints - 
which are two conflicting goals (Section 6.1). Second, the 
five linearisations and the heuristic are compared for two 
metrics, accuracy and solving time (Section 6.2). Third, an 
additional evaluation discusses how resource distribution in 
the network reduces response times (Section 6.3). For the last 
two evaluations, 111,600 problems are solved. 

In previous work [30], we had discussed aissgning re¬ 
quests to sites which provided a single compute resource (e.g. 
a single, fast server). The formulated problem was a convex, 
capacitated queue-extended p-median Facility Location Prob¬ 
lem. The present paper extends this work by considering the 
compute resources at each site as individual resources - now, 
we additionally decide the resource allocation at each site 
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Table 1: Compares response times considering (QP) and ignoring (P) QDs for different arrival rates. 


arrival rate [ rec i-/s] 

# resources 

time in system [ms] 

0RTqp 

0 RTp # resources 

time in system [ms] 

A a A b 

h h h 

h h h 

[ms] 

[ms] fi f 2 h 

/i h h 


20 

10 

3 

0 

2 

11.1 

0.0 

5.6 

56.7 

* 

62.2 

1 

1 

3 

16.7 

0.0 

5.6 

40 

30 

3 

0 

2 

9.7 

0.0 

7.6 

57.3 

* 

58.3 

2 

1 

2 

10.7 

0.0 

7.6 

60 

50 

3 

0 

2 

9.5 

0.0 

9.2 

58.7 

* 

61.3 

2 

1 

2 

12.1 

0.0 

9.2 

80 

70 

3 

0 

2 

9.9 

0.0 

11.8 

61.6 

* 

64.3 

2 

0 

3 

16.0 

0.0 

8.3 

too 

90 

3 

0 

2 

10.7 

0.0 

18.0 

68.8 


68.8 

3 

0 

2 

10.7 

0.0 

18.0 

120 

110 

3 

0 

2 

15.8 

0.0 

22.0 

79.3 

* 

102.5 

3 

0 

2 

12.6 

0.0 

49.9 

140 

130 

5 

0 

0 

42.1 

0.0 

0.0 

96.5 

* 

248.9 

3 

0 

2 

24.2 

0.0 

183.3 

160 

150 

0 

2 

3 

0.0 

7.9 

18.4 

97.6 

* 

379.1 

2 

1 

2 

159.7 

5.3 

159.7 

180 

170 

3 

2 

0 

18.9 

14.5 

0.0 

107.2 

* 

405.9 

3 

1 

1 

143.1 

63.1 

140.0 

200 

190 

0 

3 

2 

0.0 

13.4 

13.0 

111.3 

* 

223.3 

3 

2 

0 

128.5 

22.0 

0.0 

220 

210 

0 

4 

1 

0.0 

12.6 

10.6 

116.3 

* 

216.4 

2 

3 

0 

115.1 

17.7 

0.0 

240 

230 

0 

5 

0 

0.0 

12.4 

0.0 

112.4 

* 

291.3 

2 

3 

0 

105.3 

101.0 

0.0 

260 

250 

0 

5 

0 

0.0 

15.6 

0.0 

115.6 

* 

223.3 

1 

4 

0 

96.1 

34.1 

0.0 

280 

270 

0 

5 

0 

0.0 

24.3 

0.0 

124.3 


124.3 

0 

5 

0 

0.0 

24.3 

0.0 


^ a ni = 60 ret J-/s / u 2 = 120 re q/s p 3 =^ 0 re q/s 

LastMjle_.1_Backbone Network_ A _LastMNeJ 



q \ 40 ms 


a 


h 


30 ms 


/2 


30 ms 


h 


40 ms 



Cb 


(e.g., many individual, slower blade servers). The resulting 
problem becomes mixed-integer and non-linear, results in an 
M/M/k queuing model instead of an M/M/1 model, limits 
the allocation to p resources instead of p sites, and considers 
site capacities in number of available resources with each 
resource having its own processing capacity. Neither problem 
formulations, linearisation approaches, nor evaluation results 
presented here have been published before. 

2 Related Work 

Assignment problems of the form described above have been 
investigated before. We structure their comparison along 
four dimensions relevant to this work: Their model com¬ 
plexity, simplifications reducing tire problem's search space, 
optimization goals, and solution approaches. Additionally, 
related systems of geographical load balancing are compared. 
Finally, related linearisation techniques are reviewed. 

2.1 Model Complexity 

The simplest model considers only the round trip time (RTT) 
when assigning users to cloud resources. They equate 
response time with RTT. Clearly, this is a simplification of 
reality, yet minimizing this average RTT is equivalent to the 
well-known capacitated Facility Location Problem (FLP). If 
the problem is further restricted to only use p resources, it 
becomes a p-median FLP, which is NP-hard [28]. 

A step closer to reality is modelling also the processing 
time (PT) in addition to the RTT. But as long as PT is 
constant, this still stays a Facility Location Problem of the 
type described above. This can be easily seen by extending 
the original network topology by pseudo-links (at the server 
or user side) that represent these processing times via their 
latencies; this is a common rewriting technique for graph- 
based problems (including Facility Location Problems). 

The real challenge occurs when we also consider the 
queuing delay (QD). In this case, the additional time cannot 
be expressed by rewriting the network topology as the QD 
depends on the assignment decisions: A higher utilization 


results in a longer wait, possibly trading off against a shorter 
RTT. 

So far, this more general model has been considered only 
by few works discussed in the remaining of this section, most 
use simpler assumptions than ours (Section 3) rendering the 
problem easier to solve. Vidyarthi et al. [48] allow the same 
degrees of freedom as we do. They approximate, similar 
to us, the non-linear part of the objective function with a 
piece-wise linear function. However, in contrast to our work, 
they used a cutting plane technique which iteratively refines 
the piece-wise function as necessary; it remains unclear how 
large their linearisation error is. In contrast, our evaluation 
shows small linearisation errors; and this is achieved by 
using a simpler technique. 

2.2 Simplifications 

Other authors investigate slightly different scenarios, so that 
their problem formulations are similar, yet simpler than ours. 

Some authors [50], [54] replace the non-linear QD part 
with a constant upper bound and, consequently, the resulting 
problems become simpler to solve. But this also hides QD 
changes as a result of assignment changes. For instance, in 
a situation where load balancing would reduce the QD, 
this reduction is not visible as the QD part is constant. 
Consequently, the resulting solution has further potential 
for optimization - we exploit this potential. 

In another simplification, the assignments are predefined 
by a rule. Some authors [1], [50], [54] always assign requests 
to the nearest cloud resource. In such a case, the problem 
reduces to just finding the best resource location and is 
easier to solve. Tire assignments are then predetermined by 
the rule. However, balancing the assignments could further 
reduce the QD but is not considered. We do not use any 
predefined assignment rule, so we have the freedom to 
change assignments in order to further reduce the response 
times. 

Another group of authors [6], [15] uses a parametrized 
assignment rule called the gravity rule: Weights determine 
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how users are assigned to cloud resources. These config¬ 
urable weights are used to continuously solve the same 
problem with new weights reflecting the resource utilizations 
of the previous solution. This approach does not guarantee to 
converge, so the authors propose a heuristic that attenuates 
the changes in each iteration, enforcing convergence with 
an unknown linearisation error. In contrast, we solve the 
problem in one step by using all information to find the 
global optimum. 

Liu et al. [36], Lin et al. [35], and Goudarzi et al. [21] 
present a similar Facility Location Problem with convex 
costs such as queuing delays or resource's energy costs. 
In contrast to our work, they relax the integer allocation 
decision variable simplifying the problem to the cost of 
a less accurate solution when rounding up the obtained 
continuous allocations. Our goal, in contrast, is to prevent 
unexpected expenses by introducing an upper bound to 
the number of used resources. Continuously relaxing our 
problem can cause any location to be allocated a bit and, 
consequently, any site is used and paid. While the papers 
[35], [36] only consider queuing delays as a cost function, 
this paper discusses a holistic queuing system integration 
and additionally considers splitting and joining (assigning) 
of the arrival process. 

2.3 Optimization Goal 

Existing literature uses queuing delays in FLPs with three 
optimization goals: classic FLP, min /max FLP, and coverage 
FLP 

Classic FLPs are problems that minimize the average 
response time, like our problem (Section 3.2) or others [6], 
[15], [42], [48], [50], [54], allowing RT variations for individual 
users. 

Aboolian et al. [l]'s min/max problem minimizes the 
maximum response time. Intuitively, such problems improve 
especially the users' RT with high RTTs to cloud resources. 
However, if only one such user exists with resources being 
far away, assigning this user will negatively affect the 
assignments of other users: Their assignments are now less- 
restrictedly constrained by a relaxed upper bound and are 
likely worse than without the first user. In contrast, classical 
FLPs do not suffer this way from a worse case user. 

Another type of problem is coverage problems; the user 
assignment's response times is upper bounded [37], [38], 
[39]. Structurally, a coverage problem is a special, simpler 
case of a min/max problem; the first has a predefined 
bound, which is additionally minimized in the second. 
Intuitively, such problems can be applied in scenarios where 
service guarantees for a certain maximal response time will 
be provided and paid. In contrast, classical FLPs allow 
minimizing the average response time below the lowest 
possible response time bound. 

2.4 Solution Approaches 

A couple of heuristics were proposed solving related prob¬ 
lems which are variants of the NP-hard capacitated FLP [23]. 
No work so far used solvers to obtain solutions (for non- 
relaxed problems) and full enumerations are known for small 
instances limited to open five facilities [1]. A greedy dropping 
heuristic successively removes from the set of candidates that 


resource which increases the response time by the smallest 
amount [50]. Greedy adding heuristics successively add 
resources, which decreases the response time by the largest 
amount [1], [6], [15]. Another heuristic probabilistically 
selects set changes of used resources [15] or performs a 
breath-first-search through “neighbouring solutions" where 
two solutions are neighbours if their sets of used resources 
differ in one element [1], [15]. Such heuristics can be stocked 
in local optima and to mitigate this drawback meta-heuristics 
are used as a superstructure [1], [6], [15], [42], [50]. These 
meta-heuristics typically refine previously generated initial 
solutions, which are obtained randomly or by combining 
existing solutions. The hope is that among the found local 
optima, one solution is very close to the global optimum 
- but without any guarantee. In contrast, we obtain global 
optima. This is an important step for heuristic development 
as only this enables a clear judgement of heuristics' accuracy; 
their solution's gap to the global optimum. 

Others [48], [50] may achieve near optimal solutions by 
using optimization techniques like branch-and-bound and 
cutting planes but their solutions have unknown optimality 
gabs. In summary, either optima for small input or solutions 
with unknown optimality gap are obtained. This motivated 
our work on finding near-optimal solutions with a numeri¬ 
cally very small optimality gap. 

Liu et al. [36] and Wendell et al. [51] present distributed 
algorithms for their global Geographical Load Balancing 
problem by decomposing it into separate subproblems solved 
by all clients. These subproblems converge to the optimal 
solution only if they are executed in several synchronized 
rounds in which assignment and utilization information are 
exchanged among all clients. Both papers state that this dis¬ 
tributed algorithms would obtain optimal solution faster than 
gathering everything to a centralised solver. However, we 
believe that each round a communication delay is introduced 
when sending update information among all clients; they had 
ignored this delay in their evaluations. The resulting total 
delay over all rounds is likely larger than communicating 
with a centralised coordinator. In addition, our p -median 
Facility Location Problem has a global constraint on the 
maximal used resources preventing it to be easily separated 
into subproblems. 

We observed that problem instances were solved only 
exemplary so far [1], [6], [15], [35], [36], [38], [42], [42], [51]. 
Consequently, the average performance of these solution 
approaches is hard to predict. We go beyond this by under¬ 
taking a statistical performance evaluation. We randomly 
vary our input data and verify the statistical relevance of our 
findings. 

2.5 Geographical Load Balancing 

A system for Geographical Load Balancing (GLB) comprises 
two parts: The decision part selects appropriate server, sites, 
or Virtual Machines for requests of a certain origin - the 
previous sections considers them. This section focuses on 
the realisation part, which gathers monitoring information 
and implements selections. Different middlewares had been 
proposed [19], [51], [52], [53] which are shared between 
applications. In this way, each application benefits from 
instances of the other application running at diverse sites by 
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sharing monitoring information such as latency to servers 
or to customers. They realise request assignments, e.g., 
to close-by or low utilised server by either configuring 
the Domain Name System (DNS) or are explicitly queried 
ahead a request send. Slightly different, Cardellini et al. [10] 
propose redirecting requests to different sites to balance 
the load. Policies ranges from redirecting all, only largest, 
or only group requests to selecting sites based on round- 
robin, site utilization, or connection properties. Our paper 
focuses on solving the problem and investigates whether the 
more complex problem with queuing systems is worth the 
additional efforts and our results can be applied to improve 
geographical load balancing systems. 

2.6 Linearisation 

Function linearisation is a technique to approximate a non¬ 
linear function over a finite interval by line segments; Sec¬ 
tion 4.1 provides technical details. Applying this technique to 
objective function or constraints, any non-linear optimisation 
problem can be approximated by a linear problem [13], [20]; 
such a transformation is also called problem linearisation. 

We consider solving time and solution quality simulta¬ 
neously, and, hence, need to choose a suitable number of 
segments. Imamoto et al.'s algorithm [25], [26] (improved by 
us [32]) obtains segments' start and end basepoints yielding 
a high linearisation accuracy for convex, univariate functions. 
We use this improved version to obtain basepoints. 

Rebennack et al. [44] linearised multi-variate functions by 
first decomposing them into separate independent functions 
and then recombining the linearisation. This approach is 
limited to separable functions. But our function of interest is 
Erlang-C-based (Section 3.2 Eq. 2), which is not separable. 

Vidyarthi et al. [48] refines the piece-wise linear function 
iteratively by adding basepoints while solving. In contrast, 
we first compute a tight function linearisation which also 
involves modifying existing basepoints. Then, these base- 
points are integrated into the problem to solve. This two-step 
approach is much simpler to implement and to solve than 
changing the search space dynamically during solving. 

3 Problem 

3.1 Scenario 

A service provider deploys a service whose response time is a 
critical metric for service quality, e.g., an interactive service. 
Geographically distributed users request the service. In order 
to reduce the request's round trip time, geographically 
distributed resources are used, e.g., services are deployed in 
Virtual Machines (VMs) utilising compute resources of data 
centres at different sites. Resource performance (e.g. host 
performance executing the VM 3 ) and resource utilisation 
(e.g. VM's CPU utilisation) determine how fast an reply is 
computed. Tire sum of this time and the round trip time is 
the response time of a request - the user's wait time after 
sending a request until receiving its reply - shorter user wait 
means better service quality. 

3. In order to rely on performance indicators in a virtualised envi¬ 
ronment, the service need to use the host hardware, e.g. one CPU, 
exclusively. Otherwise, multiple VMs share the same hardware and the 
used performance indicator no more describes the real performance to 
expect. Then the theoretical response time computed by the optimisation 
largely deviates from the observable response time. 


3.2 Model 

The scenario (Section 1) is cast as a capacitated p- median 
Facility Location Problem [16]. A bipartite graph G has two 
types of nodes: Clients (cGC) and facilities Clients 

correspond to sites where user requests enter the network. 
Facilities represent candidate sites with available compute 
resourcessection 1.1 on which a service can be executed, e.g. 
hosts of data centres. Figure 4 (p. 7) shows such a graph. The 
round trip time l c f is the time to send data from c to / and 
back. 

Tire geographically 4 distributed demand is modelled by 
the request arrival rate A c for each client c. Each facility / 
has kf resources available 5 and each resource can process 
requests at service rate pp - the resource capacity. 

A facility consists only of homogeneous resources. Het¬ 
erogeneous resources at one facility f can easily be approx¬ 
imated by partitioning these resources into homogeneous, 
disjunct resource sets, each represented by a dedicated facility 
fi, fi, ■■■', Vc: lcf=lcf 1 =lcf 2 •••; this introduces a small 
inaccuracy by modelling multiple queues where one queue 
would be the exact model. Table 2 lists all variables. 

The expected time for computing an answer is obtained 
by utilising a queuing system at each facility, with the 
usual assumptions: The service times are exponentially 
distributed and independent. The request arrivals at each 
client c are described by a Poisson process; client requests 
can be assigned to different facilities. At one facility, requests 
arrive from different clients and the resulting arrival process 
is also a Poisson process, because splitting and joining a 
Poisson process results in a Poisson process. Therefore, we 
have an M/M/y-queuing model at each facility. 

Having this model at hand, the probability that an 
arriving request gets queued is computed by the Erlang-C for¬ 
mula EC given in, e.g., Eq. (2) of [7], Each facility has to be in 
steady state with resource utilisation p = x / yn = a / y < l . Derived 
from EC, the expected number of requests in the system (NiS) is 
INI (a, y) (3). Similarly, the expected time a request spends in the 
system (TiS) is T p (A, y) (4). 

ya v 

EC (a, y)= yaV n , (Erlang-C) (2) 

y \ ( y — a ) ’ 0 i \ 

N(a, V) = 7 a T EC(a, y) + a (NiS) (3) 

{y -a) 

t m (A, y)= yN(V^, y)= EC( V*M/) + l (TiS) (4) 

A y\x - A p 

3.3 Formulation 

Two decisions have to be made: (1) Distributing a client c's 
request rate A c to one or facilities /, each of which gets a 
request rate x c f and (2) allocating yq resources at f. Both 
influence the queuing delay non-linearly. - a well-known 
fact later recapitulated in ?? The mutual dependency between 
both decisions causes the complexity of our problem. The 

4. More precisely, the request arrival and service sites are topologically 
distributed; the round trip time of a path between two sites only 
roughly matches its geographically distance. We use "geographical" 
as an intuitive shorthand. 

5. For example, assuming one resource corresponds to a fixed-size 
Virtual Machines occupying two CPU cores, a data centre with 30 
physical hosts with 4 cores has 60 resources available. 



Table 2: Model variables 
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Input: 

G = (V, E, l, 

lef £ K>o 
A 1 / £ «>o 
k f £ N> 0 
A c £ K>o 

OLi\ 

p = T, f Vf 


Bipartite Graph with V = CUF,CC\F = 0 with 
client nodes c S C and facility nodes / e F 

Round trip time between c and / 

Request service rate as resource capacity at / 
Number of servers available at / 

Request arrival rate as demand at c 
i-th basepoint g(ai)=/3i of PWL function g 
Limit on maximal resources to allocation 


Decision variables: 

x c f £ R>o Assignment, request rate 

yf £ N>o Number of allocated resources (active servers) at 


/ 


iff ; yfj£{ 0, 1} Indicator: / open; with active PWL func. 
Zfi ; Zfji&[ 0,1] Weight of i-th basepoint of PWL func. 
hfji', h l fji£ Indicator: (j, j)-th triangle at / 

{ 0 , 1 } 


Notation shorthand: k or fc* refers to the tuple/vector (ki ,.., ki F i); 
and x c * refers to matrix slice (x c i,a; c |_F|). 


Optimisation Problem 1 QP(G, p, T*), the reference 


E c / X c/Q/ E/(E C Xc /) ~^l*f (Sc X e/, yf ) 


K <V E c Ac 


Ec E 


(5) 


avg. RTT 

x cf = e 

/ 

Xc f < yfyf 

Vc 

avg. time in system 

(demand) 

(6) 

v/ 

(capacity) 

(7) 

Vf < k f 

V/ 

(count) 

(8) 

*5 

II 

^3 


(limit) 

(9) 


formulation QP (Problem 1) extends the /t-median Facility 
Location Problem P [32] by having additional costs, the 
queuing delays, at each facility. Constraint (6) ensures that 
each client's demand is assigned. The assignments to each 
facility /, J2 C x cf> must not exceed f's service rate yfPf (7), 
which is the service rate per resource /i/ times the number 
of allocated resources yf ; this constraint also ensures that f's 
queue is always in steady state. The allocated resources yf 
do not exceed the local (8) and global (9) limit. 


4 Linearisation 

The problem QP is non-linear and could be solved by a 
non-linear solver. In previous work [32], a simpler, yet 
also non-linear version of this problem was successfully 
linearised and was solved by a linear solver fast and without 
substantial quality loss. Because of these encouraging result, 
we also follow the linearisation approach in this paper. Q P 
is more complex than our problem from the previous paper: 
While both problems decide the request assignments, the 
problem QP additionally decides the number of allocated 
resources at each site. This turns T ; , from the univariate 
function T M (A) into the bivariate function T M (A, y). In 


/Yy 


Figure 2: 1D-PWL example. Figure 3: 2D-PWL example. 

addition, the second parameter is integer, making it difficult 
to apply standard linearisation formulations. 

The remaining section describes the linearisation tech¬ 
nique in general (Section 4.1) and reformulates QP by 
linearising either several curves separately (Section 4.2, 
Section 4.3) or together as a surface (Section 4.4, Section 4.5, 
Section 4.6). 

Then, ?? further discusses potential problem simplifica¬ 
tions. Finally, Section 4.7 presents technical details on how 
Erlang-C-base functions are efficiently implemented. 

4.1 Piece-wise linear functions 

Any non-linear, univariate function g can be approximated 
over a finite interval by multiple line segments. A func¬ 
tion composed of such segments is called a piece-wise 
linear (PWL) function [20] (strictly speaking, a piece-wise 
affine function). We refer to the linearisation of g as g. 
As an example, Figure 2 shows g(x)= sin(x) and two 
segments linearising g{x). For m basepoints with coordinates 
(cci, Pi=g(on)), 0 <i<m, a continuous PWL function can be 
defined (10) by linearly interpolating between two adjacent 
basepoints (black circles in Figure 2). The interval is implicitly 
defined by the outer basepoints, [«o, a m _i]. 

{ (ai-a 0 )(x—a 0 ) + if a 0 <x<ai 

(a 2 — ai)(x— ai) + /3i if a\<x<a 2 (10) 

The linearisation accuracy e is measured by the maximal 
difference e between g and g, e= max^ | g{x) — g{x) \. Because 
the PWL function definition in (10) is not directly understood 
by an MILP solver, it has to be transformed. We use 
the convex-combination method [14] to model continuous, 
univariate PWL functions: The cases of (10) are substituted 
by a convex combination of the basepoint coordinates (11) 
with weights Zj. 

y=g{x) ^ ~^ZiOa=x , ^ ^ZiPi=y , y~Ej=l, SOS2(.z*) (11) 

i i i 

In addition, at most two adjacent basepoint weights 
are allowed to be non-zero, Zi, Zi + 1>0, Zj= 0, j<i V i+l<j. 
Most optimisation solvers allow to specify such restrictions 
as special order sets 6 ; with such additional information a 
solver's branch and bound methods can explore the search 
space more efficiently. With this restriction, coordinates 
could be expressed that are not located on line segments, e.g., 
the cross in Figure 2. 

Similar to univariate functions, bivariate functions 
g(x,y)=w can be linearised by triangles instead of seg¬ 
ments (Figure 3). Then, the convex combination is extended 
by one parameter (12) and weights for at most three vertices 

6. n-Special Ordered Sets SOS n(V) are special constraints limiting 
decisions variables of a list V so that at most n adjacent variables are 
positive and non-zero having the remaining variables zero. Notation: 
SOS2(z„) is a shorthand for SOS2(zi,.., z m ) (Table 2). 
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Figure 4: Bipartite graph of a Facility Location Problem with 
queuing systems at each facility (a) and zoomed bivariate 
time in system variants: exact (b), curve-based (c) and surface- 
based (d). 

of a single triangle are non-zero instead of the two segment 
vertices in the univariate version [20] (two vertices of a 
segment correspond to two adjacent basepoints being non¬ 
zero). 

w=g(x , y) <£> ^2 ZiOti=x, ^ Zipi=y, 

i i 

y ^Zj0i=w, y~]zj=l, SOS2(z») (12) 


Optimisation Problem 2 cQP3(G, p, T** 
separate curves, cubic 

) 



time in system / 


^ £/(E c*. 

c /) ^2 yfj 

Yj Z f i @fj i 


. Ec/ X cflcf | 

3 

i 

n 

mm ™ x + 

x *y> z 2^c X c 

1lj C 



s.t. 'y ^ %cf — 

f 

Vc 

(demand) 

(14) 

j 

C i i 

(capacity) 

(15) 

V, Zfi = 1, SOS2 (zff) 

i 

v/ 

(weights) 

(16) 

3Vfi ^ k f 

j 

V/ 

(count) 

(17) 

Y.3Vfj=P 


(limit) 

(18) 

fj 




SOS1 (yf t ) 

V/ 

(curve flip) 

(19) 


The particular challenge here is to deal with an objective 
function having bivariate cost functions with one integer 
parameter y. The following two sections present different 
linearisation strategies for this challenge. 

4.2 Curves 

Linearising Q P boils down to linearising the non-linear part 
T ;J (A. y) (4) of the objective function (5). But linearising 
T ( , is not obvious as its second parameter is integer. So, 
T ( , is reformulated as kf separate univariate functions 
T/id W=T/i(A, j ), 1 <j<kf (Figure 4b). We use here the 
notation f a {x) to indicate that / is a function with constants 
a and variables x, e.g. f$(x), fz(x) are two different functions 
of x. If exactly y resources are allocated at a facility, function 
T x.j computes the corresponding time in system. Linearising 
all these functions (Figure 4c) also obtains a linearisation of 
the mixed-integer function (4). 

Each of these functions is convex, so we can use our 
algorithm [32] to obtain corresponding linearisations T ; , ? 
with high linearisation accuracy with few basepoints. As¬ 
suming m basepoints are used for one function, one facility 
needs m • n basepoints, mkf. 

The problem allows that facilities have different service 
rates //. Linearising T (I J for different // needs different 
basepoints for high linearisation accuracy. Consequently, in 
total basepoints 7 exists. The coordinates of a single 

basepoint (a/ji, Pfji) belong to facility / with f&F, 1 <j<n, 
1 <i<m. They jointly define the approximation function 

We rewrite the original problem formulation. First, we 
reformulate the integer variable yj of decision problem 
QP into a vector of binary decision variables yfj, each 
one representing that facility / uses j resources if and 
only if yfj= 1. Exactly one of yfj is 1 (V/: SOS1 (y/*) 8 ), 
which means that yf=Jfjj Vfj■ Technically, yfj is used 
to select which time in system function T ;i/ j for a specific 

7. Basepoint coordinates can be shared among facilities with same 
service rates. If all service rates are the same, only mmaxfkf different 
basepoints exists. 


number j of allocated reources is used; more formally, 

T ys (E k ) = £j=l A). 

Second, continuous weights z /, are used to express the 
convex combination of basepoints (Section 4.1), representing 
the utilisation and the corresponding TiS at each facility. 
For each facility /, we need m weights since only a single 
function T ;J/ j is selected by the decision variable ijfj 
(and not mkf weights, one weight per basepoint at each 
facility, as one might suspect). PWL function T M/ j (A) is then 
formulated as JA z fiPfji with A f as JA ZfjOifjj. 

In the resulting problem formulation cQP3 9 (Problem 2), 
QP's objective function (5) is transformed as described above, 
replacing (5) by (13) and by (19). The new constraint (16) 
ensures a convex combination only of neighbouring weights. 
The capacity constraint (15) replaces (7). The local and global 
limits are adjusted using y; constraints (8), (9) are replaced 
by (17), (18). 

The right term of the objective function (13) multiplies 
three decision variables, xyz, making the problem cubic. 
To transform it into a linear problem, at first, the factor 
of cQP3's time-in-system function (13) is integrated in the 
basepoint coordinates. To understand this modification, QP's 
objective function (5) has to be revisited. In this function, the 
time-in-system function T /( has the factor V r . x c f, in short 
u. This term of the objective function can be reformulated 
as t/T^(w,?/). As u is also a parameter of T ;i , the term 
also equals N( u /y,y) (N defined in (3)). This term has two 
advantages: Without having a factor, the objective function 
becomes simpler (from (13) to (22)). In contrast to T ;i , N 
is independent of // and basepoint coordinates need only 
computed once and not for each different //. 


min 

x,y 


1 

J2 C ^ c 


5Z X cflcf + 
cf 


1 

J2 C 


s w 


> vs) 


( 20 ) 


8. Notation: Shorthand j/y* for yf i,.., yf n (Table 2) 

9. We use the following naming convention for optimisation problems: 
"3" indicates the cubic nature of this problem; the prefix "c" indicates 
the curve-based linearisation of the time-on-system function, later 
to be complemented by "t" for triangular formulations and "q" for 
quadrilateral formulations. 





Optimisation Problem 3 cQPl(G, p, N*) 
separate curves, linear 


total time in systems 


2^cf X cflcf fji 


x ’y< z Ec^ 

S.t. ^ ' x cf — A 
/ 


E c E 


Vc 


X cf<L l f Z fji a ji 

c ji 

J2 z fji = 1, SOS2 (z fj *)Vf,j 

i 

^2 Z fji = 2//j 

i 

Constraints (17), (18), (19) 


(demand) 

(capacity) 

(weights) 

(sync) 


(24) 

(25) 

(26) 

(27) 

(28) 


To linearise N, N ('“/;//. y) is split into func¬ 
tions Nj( u / l i f )=N( u /iJ,f,j) separately being linearised 
(similar to T M ). The inner function u /n s of the abscissa 
is resolved by modifying the abscissa coordinates of the 
linearisation basepoints. The resulting convex combination 
is defined by (21) (with weights Zi). 


N(“//.) 


f E< 

\E ,4 


u /li 4=> H J2i ZiO-i =U 

N (“//») 


,«=£*<=/ (21) 
C 


Replacing cQP3's object function (13) with (22) and 
capacity constraint (15) with (23) yields optimisation problem 
cQP2 (now quadratic, hence a "2"). 


min 

x,y,z 


J2cf x cflcf E/j Vfj Ei ZfiPji 

E c E E c E 


( 22 ) 


£*c/</7/£y/j£ajiZ/iV/ (capacity) (23) 

c j i 


In consequence, cQP2 needs fewer basepoints than cQP3, 
because Nj depends only on the number m of allocated 
resources j but is now independent of //; in total only m ■ 
max/ kf basepoints are necessary. 

The objective function of cQP2 (22) is quadratic ( yz ). We 
further simplified the problem by two modifications to derive 
a linear problem formulation: Firstly, more weights z are 
used; previously only m weights per facility f are modelled 
whereas now inn weights are modelled: Zf ]t for facility /, 
Nj, basepoint i. Doing so yields an equivalent problem with 
increased search space. Secondly, y is turned into a constraint 
enforcing that only those weights Zfji are non-zero which 
correspond to Nj being active at facility /, y f :j . Several test 
runs showed that the resulting linear problem is solved faster 
than its quadratic counterpart despite its larger search space. 

The resulting linear formulation cQPl (Problem 3) has a 
new constraint (28), ensuring that only the relevant weights 
are allowed to be non-zero. Having this constraint in place, 
the new objective function (24) equals the old objective 
function (22). Constraints (26), (27) now support the new j 
index for weights z. In this way, the new linear problem cQPl 
computes the same solution as the previous quadratic 
problem cQP2. 




Figure 5: Contour plot for N ; (a) with basepoints for 
j€J={l, 2,4,8,16, 32,40} of Nj(o, y). 

4.3 Thinned Curves 

Problem cQPl uses mn basepoints at each facility, in being 
chosen and n=kf. For example, for 100 facilities, each with 
40 resources available, cQPl has n=40 separate functions Nj 
at each facility. When using m=10 basepoints for each PWL 
function Nj, the search space contains 40,000 weight decision 
variables. The search space can be reduced by reducing 
m or n, but this lowers accuracy. Reducing the number 
of basepoints is a straightforward trade-off of accuracy 
against search space size, but simply reducing the number 
of resources n is not adequate as this modifies the problem 
instance. Hence, to obtain a similar trade-off for the resources, 
we need to find a way to reduce the amount of resources to 
look at. One option would be to allow gaps in the sequence 
of number of allocated resources, with appropriate rounding, 
J=[l,kf\. For example, with 40 available resources at a 
facility, we could remove the option to use, say, 26 resources, 
forcing the solution to either use 25 or 27 resources instead; 
this turns the interval J into an explicitly enumerated set: 
J={1,.., 25, 27,.., kf}. In the example with 100 facilities and 
?n=10 basepoints this removes 1000 decision variables. The 
set J specifies which options j£j for the number of allocated 
resources at one facility are available. As an example, Figure 5 
shows N(a, j)=t as contour lines. Basepoints are plotted 
as crosses for Nfc(a), j£j = {1,2,4,8,16,32,40} with 
| J\=n=7, ?n=6. With |F|=100 facilities, only |F|mn=4200 
weight decision variables are needed as opposed to 40,000 
variables with 77=40 and 777=10. Each PWL function Nj can 
be seen as a separate curve and thinning the sequence of 
available functions can be seen as thinning the list of all 
curves - these two perceptions gave the name for these 
formulation techniques. 

Problem cQPl (Problem 4) is adjusted by adding the set J 
as parameter and replacing the constraints (17), (18) by (29), 
(30). By dropping a particular j from J, two issues can arise: 
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Optimisation Problem 4 cQP(G, p, N,. J), thinned curves 
Obj. func./Constr. (24),(19),(25),(26),(27),(28) 


X! j yfj < k f v / 

(count) 

(29) 

16 J 



Y. i yfj - p 

(limit) 

(30) 

f,jeJ 




a) In special cases the problem becomes infeasible (as the 
options to represent a solution are reduced), b) The number 
of allocated resources J, t//= Yjejj Vfj cannot satisfy 
] f Uf=P', this is the reason why Constraint (30) is relaxed to 
an upper bound. 

To illustrate both issues, consider the following example: 
Resources should be allocated at 6 facilities, deciding values 
for 2/i..6- The facilities have kf =30 resources available and 
all resources are homogeneous. Assume that the demand is 

handled by p=18 resources and that at least 15 resources 
y' a ~ 

are needed, c ]=15. Basepoints exist for VjGJ : Nj(a). 

Starting with all resource allocations available, <7= [1,30], 
the optimal allocation is yi.s = 3 € J. A thinned J=[l, 30] 
renders cQP infeasible. Allocating 30 resources at one site 
violates the upper limit, 30^18=p. Similarly, allocating 
one resource at each facility violates the limit, (if 1 8=p. 
Removing fewer j from J reduces the likeliness of this special 
case. 

To illustrate issue b) from above, consider cQP with 
J={1,10, 20, 30}. Under constraint (30), this problem is fea¬ 
sible, e.g., with allocation t/i..6=10,1,1,1,1,1 which satisfies 
the limit YfVf~^ — p=18 and also satisfies all demand (it 
would not be feasible if constraint (30) stipulated equality). 

However, the downside is that not all p resources are 
used and queuing delay could be reduced further since 
adding a resource at any facility always reduces that facility's 
queuing delay. So adding the remaining resources, Yf Vf~P> 
will always improve the solution. The algorithms ALLOC 
and MaxCostDrop increase resources at those facilities 
where the queuing delay is reduced the most. They allocate 
the remaining resources in a way that optimally improves 
the solution. The auxiliary algorithm ALLOC merely trans¬ 
forms many QP variables into a structure suitable for the 
generic greedy algorithm MaxCostDrop. Both algorithm 
are presented here in a more generic form as both are also 
used later. MaxCostDrop considers facilities as buckets (/) 
and allocating resources as placing tokens (y'f) into these 
buckets; n=Yf Vf~P times. The queuing delay drop of 
allocating more resources ( N(af,yf+y'j )) is described by 
a decreasing, convex cost function ( Cf(y'j )), Algorithm 1 
Line 4. MaxCostDrop successively places one token at 
the bucket with the resulting, largest cost drop at this step. 
Distributing tokens to buckets in this way maximises the 
total cost and tire queuing delay drop (Theorem 1). 

Theorem 1. MAXCOSTDROP(n, c*(y), y max ) (Algorithm 2) 
maximises Yf c f(Vf) ensuring V6:y/ < yf x , Yf Vf— n f° r 
any decreasing and convex cost function Cf(x). 

Proof. For a weighted matroid M=(S, I), algorithm GREEDY 
(from [12], Theorem 16.10, p. 348ff) computes a subsetA 
with maximal weight. Line 1 defines a weighted ma¬ 
troid M=(S,I ): S is non-empty; I is hereditary meaning 


Algorithm 1 ALLOC(p, A', y= 0, F’=F) -A y,T 

Optimal allocation for known assignment. 

requires \' f <k f p f , Y / <P 

ensures A}<2//P/, E/ Vf=P 

minimises Yf N (■ A //W, yf) 

1: V/eF' : a/«— A //^/> ^niax}!//, [a/]} 

2 : n -s— p— Y f yf in 

3: if n> 0 then 

4: y' A- MaxCostDrop}™, c(j), y max ) 

v/ : Cf(j) := N {a f ,y f +j) 

V/ : yT x kf-yf 
5: V/:„4 r' +! /, 

6 : else 

7 : Vf:y f = yf 1 

8: return y, Yf N( a /> Vf) 

yf: minimal necessary allocation at / 

ymax. rema i n i n g resources usable by MaxCostDrop 

c f (j): weighted time in system at / with j 


Algorithm 2 MAXCoSTDROP(n, c(j), y max ) —t y 

Drops n tokens in m buckets while minimising total drop 
costs under bucket capacity constraint. 

requires V/: Cf(x) is decreasing and convex 
ensures V/: yf<yf x , Y / Vf=n, max Y / c f{Vf) 
l: S -e- {j//j | feF, l<j<yf x } 

I<r- {5 , CS , |n>|5"|} 

w (yfj) : = c fti) - c f(j~ i) if v> 1 else c/(o) 

2: A <-0 

3: for yf j G S, sorted by non-increasing weights do 
4: if \A\ < n then \A\ •<— A U {yfj} 

5: V/: y f <- | {yfj&A | l<j<yf x }\ 

6: return y 

yfj: the j-th token placed in /; Wfji the cost for placing yfj 

WBgI, ACB : A G /i; M satisfies the exchange property 
meaning VA, J5eJ, |A|<|B| : xGB—A A Au{y}e/; w(y) is 
positive. The lines (2)-(4) are Corman's Greedy casted to 
our M. By adding yfj to subset A, one token is added to 
bucket /; so the number of tokens yf in each bucket can be 
aggregated as in line (5). In particular, the following property 
holds: V/, j'.yfj was added before yf(j+ 1 ). This is ensured by 
the weight sorting (line (3)) and Cf (y) being decreasing and 
convex; then V/, j: vj f 3 >iu /(;+i)• From the same property 
the costs can be derived as V/: Cf(yf)=Y Vf eA w (yfj)- 
From Theorem 16.10, the computed subset AcS maximises 
Y Vfj& A w (yfj ) which also maximises c f (y f ). □ 

Lemma 1. Alloc(G, A', p) (Algorithm 1) allocates p re¬ 
sources at facilities f£F' so that Y / N( a //m/, Vf)=T is 
minimised while ensuring A}<yy/iy and Y f yf = P- 

Proof. Line (1) allocates the resources yf at least neces¬ 
sary to handle assigned demand A} having n>0 resources 
still to be allocated (n< 0 contradicts A}'s requirements). 
With n= 0 the computed minimal allocation is the only 
one and, hence, T is minimal; done. With n> 0 greedy 
algorithm MaxCostDrop is used, where placing tokens 
into buckets corresponds to increasing resource at facilities. 
The token cost function c/ (j ) and N a ( j ) is decreasing and 
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convex in j and y™ ax specifies the capacity of bucket /. 
Placing n tokens in buckets maximises the total token 
cost J2f c f(Uf ) (Theorem 1). The token cost corresponds 
to the reduction, V/: N a (yf m )-N a (yf m +y'l) = Cf(y'j), 
incurred by allocating y” additional resources at /. With 
minimal allocation ( y mm , line (1)) T is maximal. Adding 
resources with maximal reduction of token cost also maxi¬ 
mally reduces T. Consequently, the resulting T is minimal; 
done. □ 

4.4 Surface with Triangles 

As discussed in the previous section, dropping separate 
curves from J jeopardizes feasibility. To overcome this issue, 
the previously separate univariate PWL functions are now 
joined into one bivariate PWL function. This is done by 
creating a mesh of triangles over all basepoints of all curves 
in J (Section 4.1); such a mesh defines also the surface of the 
linearisation (Figure 4d). Doing so, the problem of reduced 
allocation options (thinned curves) disappears as they are 
implicitly represented by interpolating between basepoints 
of neighbouring PWL functions. Hence, the set of basepoints 
can be thinned out more aggressively to further reduce the 
search space without jeopardising feasibility. 

This section discusses three questions: 1) Using neighour- 
ing functions' basepoints for interpolation introduces inac¬ 
curacy - how large is it? 2) Modelling triangles is more 
complex than modelling line segments - will a smaller 
search space compensate for a more complex optimisation 
problem? 3) The convex combination formulation introduces 
continuous weights, rendering y f continuous - how to treat 
fractional server allocations? 

The linearisation surface S (the mesh of triangles be¬ 
tween basepoints of sequence J) linearly approximates 
the surface S of the original function g, which is in our 
case N(a,y); Figure 5 shows its contour for l<y<40 and 
corresponding basepoints of an example J. The triangle mesh 
(Figure 6a) has mn basepoints. As before, the basepoints aji, 
fiji, Oji touch the surface at g(ctji, (3ji)=0ji, 0 <j<n, 0 <i<m. 
Using the convex combination MILP formulation (31), each 
basepoint has a weight Zji as its decision variable. Only 
the three edges of exactly one triangle form the convex 
combination (SOS3) [13], [20]. This is naturally expressed 
by SOS3, but these are not supported by current solvers. 
D'Ambrosio et al. [13] presented an equivalent formula¬ 
tion (32) only using SOS1 constraints: A new auxiliary, 
binary decision variable h for each triangle indicates whether 
the triangle is active, only one h and its triangle weights 
are allowed to be non-zero while all other weights are 
forced to zero. The formulation (32) corresponds to triangle 
enumeration and orientation in Figure 6a. Integrating the 
triangle formulation from (31), (32) into cQP results in tQR_ 
(Problem 5). 

djiZji=x, ypjiZji=y,'Y^e j iZji=w,w = g{x,y), 

ji ji ji 

SOS3(z.»), = 1 ( 31 ) 

ij 

y, hji+h l ji=l, SOSl(/i**), h* 0 t =Ko=h* ma ,=hl n =Q 

ji 

Vji : Zji < hj i +hj i +h i+1 -\-hi + i-\-h i+1 +h i (32) 

j j + l 3 +1 3 + 1 

While D'Ambroisio et al. focus on tight approximation us¬ 
ing many basepoints, we additionally want to reduce solving 


Optimisation Problem 5 tQP_(G, p. N), with triangles 


y' \ ^^ x cflcf+ y, ^ S ^j Z fji®ji (33) 

z,h ^ c ° cf ^ c c fji 

' -v-" 

sum of all TiS 

,t. 5> c f = A c Vc (demand) (34) 

/ 


Z! Xc f- m 

c 

y Z fji° i ji 
ji 

V/ 

(capacity) 

(35) 

V— 

capacity at / 




Z fjiPji 

ji 

< k f 

V/ 

(count) 

(36) 

# server used at j 





Z z fi’ i ^o i 

= p 


(limit) 

(37) 

j j 





II 

V/ 

(open) 

(38) 

II 

Vf 

V/ 

(o-sync) 

(39) 


j’i’ 


y hf jH , + h l fjH , - 1, 


3 v 

SOSl (/$„) 

V/ (single-tri) 

(40) 

fr/0*=fr/*0=^/m* =/l /*n=° 

V/ (tri-corner) 

(41) 

z ji < hfji + h l fji + /' / ,, . ; 

3" hf j+ 1 1 + 


h u -L h l 

n fj+li+l ' n f j + li 

Mji (tri-sync) 

(42) 


with 0 <j'<n, 0 <i'<m 


time. This depends on the number of used decision variables 
and basepoints. So in a nutshell, we aim for large triangles 
with high linearisation accuracy; the accuracy is, as usual, 
the maximal difference between the original surface S and 
the triangle's surface S. This is facilitated by choosing good 
basepoints, and the remaining section discusses two further 
options: changing triangle orientation (this subsection) and 
replacing the triangles with quadrilaterals (Section 4.5). 

The resulting, triangle-based optimisation problem has 
four groups of decision variables: a) The request assignment 
x c f, b) binary variable ;<// activates facility / and considers 
time in system only for active facilities; c) the weights Zfji 
of the linearised surfaces at each facility /; d) the auxiliary 
variables hj^, hj rl for the upper and lower triangles ensure 
that weights Zfj t are non-zero if exactly one triangle at each 
facility is active. 

As a variant, the triangle direction can be flipped 
(Figure 6b). The adjusted formulation tQP + (G, p, l\l) uses 
objective function and constraints from (33)—(41) and replaces 
constraint (42) by (43). 

Zji < hfji + hf j i+1 + hf ji _|_i + hf j + ij + i+ 

hf j+li + h l fj+li Mji (tri-sync) (43) 

Comparing the linearisations of both triangle orienta¬ 
tions, the resulting surfaces S + , S~ are obviously different. 
Zooming into two adjacent triangles. Figure 7 shows four 
example basepoints and the two triangles of each of the two 
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Figure 6: Surface linearisations with MILP formulation relationship; a) tQP_, b) tQP +/ and c) qQP. 


a 

ica 


II 


3 


V 


x 

Figure 7: Triangle surface be¬ 
tween four basepoints. 

orientations. The three visible triangles are distinguished 
by different shades of grey (the fourth one is hidden in 
the back). Two triangles connected by the dashed diagonals 
either form an upper S' + or lower surface S~, depending 
on the orientation and on the differences of the basepoint 
coordinates. 

All basepoints lie on the original surface S. At any other 
points, at a given ( x,y ) coordinate, there are two points 
(. x , y,w) € S (the original point) and (x, y,w ) £ S (its 
linearisation); typically, w ^ w, inducing some linearisation 
inaccuracy. In general, this inaccuracy depends on the 
triangle orientation, the number of used basepoints, and 
the basepoint coordinates itself. Taking this fact into account, 
Section 6.1 discusses the generation of basepoint sets while 
maximising accuracy. 



This results in the following issue: For given x, y- 
coordinates, there are infinitely many solutions for w- 
coordinates admissible under equations (45). For a geometric 
illustration. Figure 8 shows four basepoints. For a fixed 
(xi,yi), the admissible values for w are visualised by a verti¬ 
cal grey line. This line intersects with the basepoints' convex 
hull at (xi,yi,Wi) and (aq, yi, Wz), limiting admissible w to 
the interval [uq , ic 2 ] ■ 

In general, this formulation is difficult to integrate into 
an MILP because w is not unique. However, for problems 
minimising w, the optimum is wq; maximising w results in 
w 2 . For such problems, w values are unique. And indeed, the 
relevant term of our objective function (33) Yf Yji 
is to be minimised. Hence, the quadrilateral formulation is 
applicable. 

The resulting problem formulation qQP(G, p. N) has 
objective function and constraints from (33)—(41) and replaces 
constraints (42) by (46). 

Zji < hfji + hfj i+ 1 + 

h f j+li+1 + h.fj+u \/ij (rec-sync) (46) 

The advantage of this approach is that it needs fewer decision 
variables: We need only half the number of quadrilaterals to 
cover an area than triangles (Figure 6), and each triangle or 
each quadrilateral needs its own decision variable to control 
whether its basepoints contribute to the convex combination. 
Moreover, the constraints in the quadrilateral case are simpler 
than in the triangle case - compare (42) vs. (46). 




Figure 8: Quadrilateral sur¬ 
face; convex hull of convexly 
combining four basepoints. 


4.5 Surface with Quadrilaterals 

Triangle-based linearisations can be found in the literature. 
We explore here an alternative, namely a linearisation where 
the basepoints form quadrilaterals rather than triangles. The 
hope is again to reduce the search space. 

For linearising a function g(x, y)=w a single triangle can 
be described by three basepoints and all points within the 
triangle by convexly combining the basepoints (44). Similarly, 
the four basepoints of a quadrilateral can be used (45). 
However, while equations (44) form a linear system having 
a unique solution, the corresponding equations (45) form an 
underdetermined linear system having more unknowns ( Zi ) 
than equations. 

222 
Y ajZj~x, Y PiZj=y, Y OjZi=w (44) 

i=0 i=0 i =0 

3 3 3 

Y j \iZi=x,Y j PiZi=y,Y j 9 i z i =w (45) 

i=0 i =0 i =0 


4.6 Post-processing Surface Results 

As the final building block, the algorithm SEARCH (Algo¬ 
rithm 3) overcomes the inherent drawback of all surface 
linearisations: Because linear interpolation along J is al¬ 
lowed, allocations obtained by tQP_, tQP + , or qQP are not 
necessarily integer values. However, our resources are not 
splittable, e.g., number of VM instances, thus allocations 
have to be rounded. Rounding allocations down could result 
in overutilised resources and infeasible solutions. Hence, 
fractional allocations have to be rounded up, potentially 
resulting in allocating more resources than allowed, violating 
Y f VS —P- Hence, we cannot just round up all allocations. 
Could it be possible to round up only some of these alloca¬ 
tions and round down some others? SEARCH answers this 
question. If this is not possible, another solution is obtained 
with smaller limit p'. This way, the algorithm conceptually 
tries all p'£\p\, ..,p] where pj. is the smallest number of 
resources at which the problem is still feasible. 
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Algorithm 3 Search(G, xQP, p, l\l) —> x, y) 

Testing p' < p to obtain integer y. 

1: p' p 

2: while p' < p do 

3: if p' is considered the first time then 

4: feasible, x, y 4— xQP(G,p', N) 

5: if feasible then 

6: V/ : y f <- \y f ], \’ f = E c x cf 

7: A Yf Vf — P // valid solution for A<0 

8: if A = 0 then return x, y // direct hit 

9: if A < 0 then / / add remaining resources 

10: y «- ALLOC(p, A', y) 

11: return x, y 

12: if A > 0 then / / too many resources 

13: y «- DeAlloc(p, A', y) 

14: if DeAlloc(-) was feasible then 

15: return x, y 

16: else 

17: decrease p' by A 

18: else / / infeasible with current p' 

19: increase p' by 1 

20: else lip 1 was considered in previous loop iterations 

21: increase p' by 1 

22: return SEARCH is infeasible 


In detail, SEARCH is a bit more complicated. While it 
would be correct to iterate over the sequence p,p— 1,... ,Pf, 
this has unacceptable runtime, in particular owing to the 
frequent calls to solve xQP. But solutions for any p' and 
p' — 1 are most likely to be very similar anyway: If any p' 
had inappropriate assignments, most likely p'—l's solution 
is similarly inappropriate; so we could skip p— 1 to save 
runtime. Hence, SEARCH modifies p' in larger steps, trying 
to find a suitable solution more quickly. The intuition for the 
step size is the number of excess allocations due to rounding 
(see second-last paragraph in this subsection). 

Search invokes tQP_, tQP + , or qQP. The parameter xQP 
points to the function solving the concrete problem, e.g. 
tQP . Solving xQP results in one of three cases: a) Ex¬ 
actly p resources are allocated; the algorithm is done, 
b) Fewer resources than requested are allocated (A<0), 
then ALLOC (Algorithm 1) distributes the remaining |A| 
resources; the algorithm is done, c) More resources than 
requested are allocated (A>0), then DeAlloc removes A 
resources; DeAlloc (Algorithm 4) is basically a twin of 
Alloc removing resources one by one; details on DEALLOC 
are provided laterthis paper's extended version provides 
details on DeAlloc. Often, removing all A resources will 
not be feasible because the current assignment distributes 
the demand improperly among the facilities. Then, only 
adjusting the assignments itself can help. This is done by 
reinvoking xQP with a smaller p'. The resulting solution is 
processed as the original limit p in one of the three cases. 
The basic idea is to reduce p' and find a good assignment for 
which the allocation can be adjusted to use p resources. At 
some point p' is so small that xQP becomes infeasible. 

The algorithm SEARCH finishes in two cases: i) The 
allocation matches (a) or could be fixed (b,c). ii) After 
considering all p'&\pi, ..,p] without success, the algorithm 


Algorithm 4 DeAlloc (p, A', y=kf) —ty,T 
Optimally reduce over-allocation, specified in vector y. 

requires A ' f <kfy f ,Y / P'Vwl <P 
ensures \' f <y f pf, Y / Vf=P 

minimises Yf N ( A /E/,y/) 

1: V/eF' : yf^max-jy/, [a/]} 

2 : n <r- YfVf-P 

3: if ?z>0 then 

4: y sub <- MAXCOSTDROP(n, c*, 2 / max ) 

V/ : Cf(y') := m-N(a f ,y f +y') 
with Vj/=N(., 1) : > y 

V/ : yf ax <- kf-yf n 
5: V/ : y f = k f — yf h 

6: return y. E/ N ( a /.2//) 

stops without solution. Case (ii) occurs if xQP with p ±+1 
has a solution whose over-allocation could not be fixed 
by DeAlloc due to the assignments while xQP with pf 
is infeasible. In our evaluations this case occurred more 
frequently with inputs having a very high system utilisation, 
£/ x f/j2f fc //H > 0.96 - in such cases, the minimal necessary 
allocation matches the limit. 

To realise bigger jumps between values for p', SEARCH 
examines p, ..,Pf as follows (pf is initially unknown): It starts 
with p and jumps down to p'-s—p— A (Line 17), skipping 
as many potential resource limits as resources were over¬ 
allocated. The idea is that for large over-allocations, a 
larger step (A) is necessary than for slight over-allocations 
to change the assignment. It continues jumping down, 
p'<—p'— A, until the new p' is infeasible and then tests 
increasing p', p'i—p'+l. The next values for p' might also be 
infeasible and increasing p' is continued until a solution for 
p' can be obtained, p=p\- SEARCH continues increasing p' 
and additionally skipping already considered p' until p'=p is 
reached and the search terminates without success. The first 
stage where p' jumps down allows to quickly find a small p' 
for which the assignment is appropriate. If not, the second 
stage ensures that allp' € \p^..,p] are tried. 

In our evaluations, only for inputs with very high system 
utilisation, testing many p' limits results in long solving time. 
The remaining inputs needed only 3-7 iterations and separate 
solving attempts of xQP (p '). This raises the question if the 
runtime of these multiple solving times can still compete 
with solving times of a single solving attempt for, e.g., cQP? 
An answer is provided by our evaluation in Section 6.2. 

Lemma 2. DeAlloc(p, A', k) (Algorithm 4) removes 
p resources at fully allocated facilities f&F' so 
that Y f N( a //m/, y/)=T is minimised while ensuring 

A /<2//M/,E/2//=P- 

Proof. Similar to Alloc, DeAlloc uses MaxCOSTDrop, 
but while ALLOC interprets dropped tokens as added 
resources, DeAlloc interprets dropped tokens as re¬ 
moved resources; this make adjustements and a new 
minimality deduction necessary. With n=0 no resources 
need to be removed Yfhf = P having one solution with 
T=Y f N (a, / , kf) minimal; done. With ?i>0, algorithm Max- 
CostDrop computes how many resources are removed at 
which facility. After removing the resources, T increases by 
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J2j N(a/, kf) — N(a/, kf—yf) = Ta; so minimising Ta will 
also minimise T. As MaxCostDrop maximises the token 
costs (Theorem 1), we need to flip them around (line (4)). 
This way, Ta is minimal and so is T; done. □ 


4.7 Linearise Erlang-C Formula 

This section describes our approach to linearise the Erlang-C 
function EC (2). Our algorithm based on Imamoto et al. [25] 
obtains basepoints with small linearisation accuracy [32]. 
This small error is achieved by successively changing the 
basepoints and estimating their error. To compute basepoint 
changes, our algorithm needs the first derivative of the 
function to be linearised. This section proposes an efficient, 
vectorised recursive functions to compute EC, its derivative, 
and Erlang-C-related functions fast. This reduces the runtime 
for obtaining the basepoints. 

Beside the efficiency needs, EC becomes not computable 
with a plain implementation 10 for k > 1 15 and a near k. 
For such a, k, the term a k or becomes large enough to 
overflow 11 and further processing is distorted. Alternatively, 
algebra systems such as Maxima resolves expressions symbol¬ 
ically allowing a greater accuracy paid by a longer runtime 
and memory usage 12 . A better alternative is proposed by 
Pasternack et al. [43]: The recursive function V (47) replaces 
the computationally challenging part. 

For an M/M/k-queuing system with arrival rate A, 
service rate [i, and k servers available, the probability of 
no jobs To jobs (48) can be rewritten as (49), a= x /n<k. As part 
of the Erlang-C formula (50), it can be rewritten similarly 
using V (51). Other queue systems' performance measures 
use EC. Examples are the expected number of requests in the 
system N(a, k) (52) or in queue 1X1*2 (A, /i, k) (53), the expected 
time of request spent in system T(A, //,, k ) (54) or in queue 
T Q (A, n, k) (55) [7], 


V(a, k) 

k~l j . 

i-k 

i= 0 



f 4 if fc=1 

“ jJ(V(a,fc-l)+l) else 

(47) 

Po jobs 

f ka k _1 

\ fc! (k-a) i\ ) 

(48) 


a k k o fe ^ 1 o i fc! j 

k\ k — a + k\ i\ a k 

i= 0 



=V(a,k) 



EC(a, k) = 


a k k + (k — a)V(a, k) 
k\ k — a 

k\ k — a 

a k k + (k — a)V(a, k) 

ka k p h 

k! (k-a) 0,obs 


(49) 

(50) 


10. Implementation in Python 2.7, Scipy 1.8.0, 64 bit-floats; computing 
EC(139,140) on our laptop lasts 16 ms. 

11. The number of floating-point bits determines the largest number 
which can be encoded, in our case ~ 1.81-10 308 with 64 bit. If an 
operation results in a value larger than this limit the result is replaced 
by a special value, inf, indicating the overflow. 

12. Computing EC(500,1000) with Maxima 5.26.0 took 0.6 s and 
consumed 80MiB. EC(1000, 2000) took 3.8 s and consumed 374 MiB. 


Algorithm 5 Vectorised function V(a, k) (47) 

1: function V (a, k) 

2: r ■*— f/a with same length as a and k 

3: for i = 2 ,.max{k} do 

4: mask-t— k > i 

5: r [mask] <- (r[mask] + 1) 

return r 

With array arithmetic a=ai, : element-wise operations 

a®b 44- Vi:ai®bi and masking d[l, 0,..., 0, l]=ai, a m . 


ka k 


k\ 


k — a 


k\ (k — a) a k k + (k — a)V(a, k ) 
k 


N(a, k) 


k + (k — a)V(a, k) 
= a + — EC(a, k) 


= a + 


k — a 
a 


— a + 

N Q (A ,y,k)= 


k — a k + (k — a)V(a, k) 
ak 

(.fe — a)k + (k — a) 2 V(a, k) 
A k 

Hk-X k+ n^V{^k) 

A k 


fc(M _A) + M^P V (A,fc) 
1 k 




T Q (A, M , k) = 


k 


fc(Mfc-A) + ^V(^) 


(51) 


(52) 


(53) 

(54) 

(55) 


Two technical performance improvements are worth 
mentioning: Pasternack et al. [43]'s recursive function is im¬ 
plemented as a loop that not only avoids time needed to push 
local variables on the stack for each recursive call but also 
evades the limited stack depth also limiting k. Implementing 
this improvement results in computation magnitudes faster 13 
than the other alternatives. The function EC with V becomes 
not computable for /c>100,000, sufficient enough for our 
scenario. In addition a vectorised version of V(a, k) allows 
computing several values V(ao, •••, a m ; ko ,..., k m )=to ,..., t m 
in a row (Algorithm 5). By masking the array fields, differ¬ 
ent loop iterations depths ko,k rn are processed in one 
loop (Line 5). This vectorises our linearisation algorithm and, 
thus, is much faster than a simple implementation. 

Finally, the first derivation of N is needed. The basepoints 
are obtained for parameter a with fix k, Nfc(a). As a result, 
Nfc(a) is differentiated only for a; details in Section 8.1. The 
first derivative ^Nfe(a) (58) is rewritten with V (59). 


5 Heuristic 

This section discusses an adaption of the most related 
(Section 2) heuristic proposed by Aboolian et al. [1], While 
their work also combines the Facility Location Problem with 
M/M/k-queuing systems at each facility, their problem QP A 
differs from QP in three points: First, QP A minimises the 
maximal response time whereas QP minimises the average 
response time. Second, assignments in QP A are predefined 
whereas QP also decides the assignments. Third, QP A has 

13. Computing EC(139,140) via V took 0.6 ms (Python 2.7). 
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Algorithm 6 Combines local solutions to find new ones, 
l: function Genetic(G , p) 

2: /* Create a random population of solutions */ 

3: P 4- 0, l 4- [f\F\\ 

4: while not enough solutions in P do 

5: F s 4— l random facilities from F 

6: F s ,yf,t 4 DESCENT(G, F s , p, F, 0) 

7: if solution not found then increase l 

8: else 

9: if F S ^LP then add (F s , yf, t) to P 

10: while not enough merge steps are done do 

11: /* merge two solutions */ 

12: F s , Fj -f— two random F s € P 

13: Fu 4— F s UFj; Fi -4- F s CiF' s ; 

14: Fm 4— three random f G F \ F\j / / Mutation 

15: F d 4- (F v \ Fj) U F m 

16: Fn 4— Fi with one /gF d added / / Mutation 

17: F a ,y f ,t4- DESCENT(G , ,F N ,p,F D ,F I ) 

18: if solution found A F S ^P At < largest t in P then 

19: replace worst solution in P with current 

20: return F s ,yf,t from P with smallest t 


no resource limit per facility, which QP has. These three 
differences necessitate adjustments to QP A 's heuristic. 

The resulting heuristic H consists of four major parts. 
ALLOC (Algorithm 1) computes the optimal allocation for a 
given assignment to a subset of facilities F S CF. SOLVE (Algo¬ 
rithm 8) first assigns requests to the closest facilities in a given 
facility subset F s and computes the corresponding allocation 
with Alloc. Solve is used by Descent (Algorithm 7), 
which iteratively varies the facility subset to find better 
subsets. These variations are limited by two additional 
facility subsets F 1 and F D , so that only a local minimum is 
found. To find new local minima. Genetic (Algorithm 6) 
randomly combines already found solutions. 

The meta-heuristic GENETIC maintains a finite set of 
currently best solutions P. At first, an initial set of solutions 
is randomly generated (Line 3-Line 9). This is influenced by 
two factors: The number of maintained solutions, \P\, is pre¬ 
defined; maintaining many solutions increases the chance to 
find different maxima but also increases runtime. The size of 
facility subsets l starts from \/\F\ as in the original heuristic; 
if no solution is found with these facilities, I is increased. An 
initial solution is generated by invoking DESCENT (Line 6) 
with l randomly chosen facilities. DESCENT returns a new 
subset F s , potentially different from the chosen facilities, the 
allocation yf, and corresponding average response time t as 
the measure of solution quality. 

In Genetic's major part (Line 10-Line 19), two random 
solutions from P are combined to find new facility subsets. 
Three such new subsets are computed: The intersection F\ of 
both solutions' subsets F s , Fj is part of the new offspring 
facility set F^. The domain Fd is the union of F,, Fj with 
three random new facilities Fm; Fd; it limits the following 
explorations by DESCENT. The offspring Fm construction is 
the same as in the original heuristic. A local solution found 
by DESCENT replaces the worst solution in P if the newly 
found solution is better than the replaced one. After a finite 
number of merge operations, the best solution found so far 
is returned. 


Algorithm 7 Refines solution towards local optimum 

l: function Descent(G, F s , p, F D , F,) 

2: F s min , yf n , t min 4- F s , SOLVE (Fj, p) 

3: while smaller /, mm was found do 

4: for Fj G NElGH(Fj nin , F D , F{) do 

5: y'f, t’ Solve(G, Fj, p) 

6: if t’ < f min then 

r 7 . z^min -.min j-min , rv „/ xf 

'• r s 5 y f i ^ r s'>yf’> L 

8: return Fj 1 ™, yf', t 1 ™ 1 


Algorithm 8 Approximate assignment and optimal allocation 

l: function Solve(G, F s , p) 

2: Vc, /gGxF : X c f 4— 0 

3: for c, /gGxF s sorted by increasing l c f do 

4: dem 4— A c - Jfj, x cf > 

5: Cap G- kfPf-^Xcff 

6: if dem > 0 A cap > 0 then 

7: x c f A- min{ dem , cap } 

8: return Alloc(G, V/ : x c f, p) 


DESCENT starts with a given facility subset F s and 
searches so-called neighbouring subsets for better solutions. 
Aboolian et al. define F s ’ s neighbourhood (Neigh) as a set 
of facility subsets constructed from F s by a) removing one 
facility , b) adding one facility, or c) doing both (56). 

NEIGH(F S , F d , F : ) = {F a U{/} | V/gF d \ F s } (a) 

n{F s \{/} |V/GF S \F,} (b) 

n {F s u{/}\{/'} I v/gf d \ f„ vfeF, \ F} (c) 

with F] C F s C F d C F (56) 

If at least one better solution is found in the subset neigh¬ 
bourhood, the search continues for the best found solution. 
The algorithm hence descends towards a local minimum. 

The facility sets F D , F\ restrict the cardinality of Neigh 
and thus the number of instances solved by SOLVE. This is 
done by ensuring that F : - the intersection of the two parents 
- is always part of F s and that Fd - the modified union of 
the two parents - is the subset of F to which F s can grow. 

SOLVE (Algorithm 8) computes the assignment and 
allocation for a given facility subset F s using p resources. 
The first part assigns the demand A c to the closest facilities. 
This assignment is more complex than QP A 's assignments 
which neither considers resource capacities (/; f) nor limits 
(k f ). When a facility's resources are exhausted but demand 
still exists, this demand is served by another facility. What 
was a simple binary assignment to the nearest facility in 
Aboolian's problem (QP A ) becomes an optimisation problem 
to minimise the total assignment distances (i.e., min^, l c fX c f ) 
under capacity and serve-all-demand constraints. To solve 
this problem efficiently, the assignment is computed by the 
greedy heuristic in Line 2-7: Demand is served by the nearest 
facility / with remaining capacity. To do so, c, / pairs with 
the shortest distance are handled first. If demand remains to 
be distributed (i.e., A c x c y >0) and facility / has enough 
capacity left (p,fkf— Jf c , x c >f > 0) then as much demand as 
possible (A) is assigned from c to /. This way, the demand 
is moved to free facilities in a way similar to a multi-source 
breath-first search. 
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Table 3: Investigated values for J and to. 


J'S shorthand 

J = n 

n 

m 

klOO 

1,..., 100 

100 

30 

fib 

1,2,3,5,8,13,21,34,55,89,100 

11 

15 

2 i 

1,2,4,8,16,32,64,100 

8 

8 

3 i 

1,3,9,27,81,100 

6 

6 

4* 

1,4,16,64,100 

5 

4 

k3 

1,50,100 

3 



While this heuristic is in most cases sufficient, it does 
not compute the total average RTT in all cases. Let us 
consider two clients c, c! with demand, A=l, and facilities 
/, /' with one capacity, /i=l, having the following latencies 
l c f= 1, l c fi=l c ff= 2, l c 'f> =4. The assignment heuristic (Line 2- 
7) computes x c f=x c >f = 1. First the c, / pair is visited, then 
c'f is not possible as capacity is exceeded, and finally c'f 
is paired. These assignments result in total RTT 5 but the 
optimum is 4 with assignment x c f=x c >f>=l. 

One way to handle this case, is to check for total distance 
reduction when swapping assignments in a post-processing. 
The swapping procedure is very time consuming as consider¬ 
ing all pairs only once is not sufficient to obtain the optimal 
solution. The described special case rarely occurred in our 
test instances and we had favoured short runtime, so that we 
omitted such a post-processing. 

The presented assignment heuristic applies an extended 
nearest-facility assignment rule and, hence, is more restrictive 
than QP. We tested an alternative implementation for SOLVE: 
Assignment and allocation are obtained by our fastest 
optimisation problem qQP for facility subset F s . However, 
SOLVE as the basic function of H is called very often and 
the runtimes were unsatisfactory longer than the presented 
implementation, much longer than any other presented 
approach to solve QP discussed in Section 6. Because of 
this, we entirely skipped this variant. 


6 Evaluation 

6.1 Obtaining the Basepoints 

The linearised formulations' quality and solving time depend 
on the choice of basepoints. For all optimization problems 
above, more basepoints improve the linearisation accuracy 
and improve solution quality. But more basepoints also 
increase the search space and extend solving time. 

Four control factors influences the accuracy: a) The 
number of basepoints m for one curve N j (a); b) the number 
of curves/allocated resources n, 1 <n<kf, c) the linearisation 
interval's upper bounds for a and j; d) the basepoint 
positions themselves. 

For factor (c), we allow 0<a<0.98y similar to our previ¬ 
ous work [32] and l<j<100. Setting the maximal number 
of available resources V/ : Ay=100 is large enough for 
our evaluation. Our findings can be applied for larger 
values of kf that decrease either the linearisation accuracy 
(for fixed to) or the number of basepoints m (for fixed 
accuracy) [32]. For factor (c), our algorithm [32] is applied 
obtaining basepoint sets with high linearisation accuracy. 
For factors (a) and (b), different configurations for to and 
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Figure 9: Surface linearisation accuracy e v (57). 

J are investigated. Table 3 shows these configurations. For 
example, J=[l,50,100] means n = 3 curves with y= 1, 5, 
or 100 are considered, indicated by the shorthand k3. For 
each list J, the corresponding Nj(a), j£ J are approximated 
by m = 4, 6, 8, 15, or 30 basepoints. The table describes 30 
different configurations (J, m). 

The concrete values for to result from the following 
thoughts: When adding a basepoint to a PWL function 
(for one curve), accuracy improvement has diminishing 
returns [32], e.g., adding a basepoint to 4 basepoints helps 
more than adding one to 8 basepoints. So, many low values 
for m cover the interval where accuracy changes significantly 
and a few large values provide hints towards asymptotic 
behaviour. The cases for to = 2 or 3 basepoints are dropped; 
m= 2 results in one line segment and with m= 3, having 
two line segments, approximating our convex function still 
results in low accuracy. 

The concrete values for J result from the following 
thoughts: As discussed in Section 4.3, dropping larger j£j 
from J results in less inaccuracy than dropping small j, 
so we choose sequences like a n =2 n favouring many small 
values over few large values. Additionally, J=fcl00 is added 
as a baseline comparison using all curves. 

Figure 9 shows the error e A - defined in equation (57) for 
each (to, J) configuration (Table 2) plotted as a function of 
the number of basepoints nm; its labels encode the different 
configurations for to, J (using the above shorthand for J). 
Two symbols correspond to the two triangle orientations (Fig¬ 
ure 6) used to define the approximating surface S. 

Cjv := max a ,k \ N(a, k) - N(a , k) 

N is paremeterised by (to, J, triangle orientation + /—) (57) 

Figure 9 also illustrates the trade-off between two conflicting 
metrics: Few basepoints nm, shown on the left, and low error 
e fj, shown on the right. Pareto-optimal solutions (for each 
triangle orientation) are shown grey-shaded. 

For our evaluation we choose the basepoint sets as 
follows: Since many basepoints nm increase the search space 
size, large nm >330 were skipped; the lowest error of these 
skipped configurations is only marginally smaller than the 
lowest error of the remaining configurations. The remaining 
nm values were split in three partitions. In each partition, one 
configuration was chosen with the lowest error e^ satisfying 
the Pareto property: (to, J)=(15, 2 l ), (8,3*), (6,4*). In addi¬ 
tion, configuration (8,klOO) from the baseline basepoint sets 
(klOO) was added. The number of basepoints m=8 provides 
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a good trade-off between low error and few basepoints, e.g., 
with 25% fewer basepoints the error nearly doubles (6,kl00) 
and with 88% more basepoints the error is only divided by 
four (15,kl00). 

6.2 Approach Comparison 

The main question is which presented approach (one of 
optimisation formulations or heuristic) solves QP best, 
where best is described by two conflicting metrics: Quality 
(accuracy) and solving time. We consider examples in which 
we vary four factors: Topology G, demand distribution 1), 
basepoint set B, and resource limit p. 

Candidate topologies were collected from different 
sources: sndlib 14 [40], topology zootable 4 [33], and king- 
trace 15 [22]. The topology sources offer 534 candidate 
topologies from which 8 were selected (Table 4) by three 
properties: First, the number of nodes increases the problem 
size quadratically. We selected three small (marked as n- 
) (with at least 20 nodes), three medium (n/), and one 16 
large (n+) topologies. Second, the round trip times con¬ 
tribute to the objective function. The selected topologies 
have low (1-), medium (1/), or high (1+) average round trip 
times, 0K£T= 1 /\e\^ jvv ,I vv i. Third, resource distributions 
will be degree-dependent, e.g., few resources on poorly 
connected nodes and many resources on well connected 
nodes. Two topologies were explicitly selected (marked by d) 
whose quartiles of node degrees differed from the other 
topologies (Table 4). 

The available resources, in total kf=5\N\, are as¬ 
signed to nodes weighted by degree. All resources are 
homogeneous with /i=100 rec i/s=/z/, V/. 

The second factor D describes how the individual Poisson 
processes' arrival rates A c are assigned to nodes c. For each 
evaluation run, we choose all A c randomly but keep them 
fixed within one run. Averaged over all runs, the expected 
rate is A and all A c are distributed according to one of three 
different distributions: a) Gaussian normal distribution with 
small standard deviation, D=N(X, a / 2 o)=Ni b) Gaussian 
normal distribution with large variance D=N(A, A)=N 2 , c) 
and exponential distribution D =Exp (A) with even stronger 
variations to reflect local hot spots. We ignore nodes with 
negative arrival rates, hence the random variables A c are 
defined as follows: Vc:A c = max{0, A'} with X ~l). 17 

The mean arrival rate is A =°- 98 /2 This way, on 

average half the available resources are needed to handle the 
demand. This is the typical scenario where taking both round 
trip time and queuing delay pays off - if almost all resources 
are needed anyway, there is no freedom of choice; if few 
resources are needed, queueing delay is unimportant and the 
problem degenerates into a simple RTT optimization problem. 

14. Round trip times were approximated by geographical dis¬ 
tances [29]. Nodes without geolocation positions were removed and 
their neighbours were directly connected. 

15. In kingtrace, a sparse matrix specifies point-to-point latencies. 
Some latencies were only available in one direction. We assume the 
same latency for the opposite direction; otherwise those nodes would 
have to be discarded. 

16. Only one of three large topologies were solved within the time 
limit, see extended version [31]. 

17. This slightly skews the expected values, but for the concrete 
scenarios, this effect was negligible. 


Technically, the reduced factor 0.98 instead of 1 reflects the 
linearisation bound a m =0.98k (factor (d) in Section 6.1) and 
ensures that all queuing systems are in steady state. 

The third factor is the resource limit p= It influ¬ 

ences the average resource utilisation and, hence, the average 
queuing delay. The higher the resource limit, the lower 
the resource utilisation. Since the total number of available 
resources (X! f &/) differs among the topology sizes (Table 4), 
the resource limit is cast as a function of this total number of 
resources, p= [a J2fkf 1; f° r values a<0.5 the input becomes 
infeasible due to the selection of A, where half of the resources 
are needed to handle the demand. For the evaluation, factor 
p uses a=0.5625; 0.625; 0.6875; 0.75; 0.8125; 0.875; 0.9375; 
where a = 0.56 allows slightly more resources than needed 
for the demand and a = 0.94 allows using nearly all 
resources. 

While the first three factors vary the topology, the fourth 
factor basepoint set B varies the linearised problems. From 
Section 6.1, B is (to, J)e{(15, 2 1 ), (8, 3*), (6,4*), (8, fclOO)}. 

Putting all factors together, 168 factor combinations ( G , 
D, p) are considered and for each one, 50 random demand 
realisations are generated. This results in 8400 different config¬ 
urations to be solved by either the optimization solver or the 
heuristic. The linearised problems (cQP, tQP_, tQP + , qQP) 
use 3 basepoint sets _B={(15,2*), (8, 3*), (6,4*)}. In addition, 
cQP is solved with basepoint set (8, fclOO) (Section 6.1) and 
serves as the baseline comparison; this case considers all 
curves and, hence, has the highest linearisation accuracy and 
the best solution quality. The randomised heuristic solves 
each configuration up to 15 times to obtain statistically 
meaningful results. But for medium or large topologies 
the heuristic did not compute a solution in reasonable 
time because after 6 hours it still builds up its initial 
population. As an optimisation solver, Gurobi 18 is used 
and is configured to stop solving after one hour 19 . Especially 
for larger topologies, this often causes optimality gaps up to 
20 %. 

The remaining section first discusses example results (Sec¬ 
tion 6.2.1) and then presents aggregated statistics revealing 
the core findings. 

6.2.1 Detailed Results 

Figure 10a shows two plots with either the solution quality 
as the average response time and the corresponding average 
solving time with 95%—confidence intervals. They compare 
all 14 solving approaches, represented by different symbols, 
for different utilisation levels p. Figure 10a shows results for 
medium topology Colt with exponential demand distribu¬ 
tion. Below, Figure 10b shows results for the same topology 
but with normal demand distribution Ni. At last, Figure 10c 
shows results for smaller topology Pioro with demand 
distribution N 2 . 

In the shown plots, the baseline approach cQP 8fcl0 o 
computes the best solution (smallest avg. resp. time) among 
the other approaches but is the slowest approach, except for 
small topologies (e.g. Figure 10c) where other approaches 
are significant slower. Section 6.2.2 describes how cQP 8fcl0 o 
performs among the other topologies. 

18. Gurobi version 5.6.3, Python version 2.7.9 

19. Hardware used: 2 of 6 Xeon X5650 Cores, 2.7 GHz, 4 GB RAM 
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(a) For G = Colt, D = Exp 
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Table 4: Topology overview. 


N 3-ITl6source 

G 

Mark 

Size 

\N\ 

Density 

2I jv I/|b|(|b|-i) 

0RTT 

[ms] 

diarriRTi 

[ms] 

Degree Perc. 

[ min 25 % median 75 % max ] 

Columbus zoo 

d 

31 

0.211 

326.5 

1203.8 

[ 1.0 3.0 4.0 11.0 12.0] 

Cesnet20 zoo 

n-1- 

38 

0.064 

189.5 

387.3 

[1.0 1.0 1.0 3.0 15.0] 

giul39 sod 

d 

39 

0.116 

447.9 

852.1 

[ 3.0 3.0 4.0 5.0 8.0 ] 

pioro40 snd 

n- 1+ 

40 

0.114 

545.6 

1140.8 

[ 4.0 4.0 4.0 5.0 5.0 ] 

Colt zoo 

n/ 1/ 

149 

0.016 

546.8 

1338.5 

[ 1.0 1.0 2.0 3.0 18.0 ] 

UsCarrie zoo 

n/ 1/ 

152 

0.016 

732.7 

2076.6 

[ 1.0 2.0 2.0 3.0 6.0 ] 

Cogentco zoo 

n/ 1/ 

186 

0.014 

865.9 

2260.1 

[ 1.0 2.0 2.0 3.0 9.0 ] 

Kdl zoo 

n+ 1+ 

726 

0.003 

1476.4 

4317.1 

[ 1.0 2.0 2.0 3.0 10.0 ] 

intercon* zoo 

n+ 1/ 

1108 

0.002 

595.8 

1758.4 

[ 1.0 1.0 2.0 3.0 54.0 ] 

kingtrace* 

n+1- 

1819 

0.180 

47.9 

726.4 

[ 3.0 220.0 242.0 467.0 559.0 ] 


(*) Marked topologies were too large to be solved efficiently. 
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Figure 11: Comparing the quality of baseline approach 
cQP 8j fcioo with the other approaches as a CDF plot. 

The different formulations cQP.,, tQP_ y , tQP y , and 
qQPj are grouped together and the three basepoint sets 
j G {(6,4*), (8,3*), (15,2*)} form three groups distin¬ 
guished in Figure lOa-c by different grey levels. Among 
these four approaches, the thinned curve formulation cQP ; 
is the simplest and the fastest in each group, sometimes 
magnitudes faster, while the quality is a bit worse than the 
other approaches. When comparing the surface linearisations, 
the quadrilateral-based formulation qQP is as good as the 
other two formulations but is solved faster in some cases. 
Section 6.2.3 describes how the surface linearisations perform 
among all configurations. 

Finally, the last solving approach uses the genetic algo¬ 
rithm as an heuristic. For medium and large topology, the 
heuristics's solving time exceed a maximum runtime of six 
hours in most cases; for those cases no measurements are 
available. Section 6.2.5 describes how the heuristic performs 
among all configurations. 

6.2.2 Baseline Algorithm 

The approach cQP 8 fc i 0 o is the baseline formulation. We 
compare solution alternatives to the baseline by computing 
the ratio of the alternative's quality to the baseline's quality 
individually for each of the 8400 configurations (ratio > 1 
means the baseline algorithm performs better). As this 
produces many individual results, we group similar solution 
alternatives together (see below for details). The ratios 
in these groups are then jointly described by empirical 
cumulative density functions (ECDFs), e.g. Figure 11. 


We identify five groups of similar solution alternatives. 
The first group contains all thinned curves approaches 
(cQP 6)4 i,cQP 8i 3 i,cQP 15) 2 i); for this group, 99.4% of all con¬ 
figurations are solved better by the baseline approach. For 
the remaining configurations, the lowest ratio is 0.97; the 
better solutions are caused by ALLOC which uses the exact 
queuing delay function. 

The second, third, and fourth group contain all surface 
linearisations with basepoint sets B = (6,4*), (8,3*), and 
(15,2*). For these groups, the configuration are solved 
similarly well as with the baseline approach. Surprisingly, 
26%, 51%, and 63% of all configurations result in ratios 
below 1, with the lowest ratios being 0.95, 0.96, and 0.96. 
This is caused by algorithm ALLOC called by the post¬ 
processing Search. 

The fifth group contains all heuristic solutions, which are 
very good for small topologies but for the other topologies 
the quality was magnitudes worse. Section 6.2.5 has more 
details about the heuristic. 

To summarize, solutions obtained by cQP 8 kl00 are good 
references for the expected solution quality and solutions obtained 
by tQP, qQP are similarly good. Better solution qualities than 
the baseline algorithm's quality involved using ALLOC; the 
difference to the baseline is always small and results from 
using the exact instead of the linearised queuing delay. The 
difference relates to the linearisation accuracy. 

6.2.3 Thinned Curves vs. Surface Linearisations 
One of the goals of this paper was to compare univariate 
vs. bivariate linearisations of the time-in-system function. 
To do so, we compare here the quality ratios obtained from 
either thinned curve linearisation (Section 4.3) or surface 
approximations (Sections 4.4 and 4.5). For the comparison, 
we fix the basepoint sets j € {(6,4*), (8,3*), (15,2*)}. For 
each basepoint set, we compute the quality ratio by dividing 
the solution quality of tQP +J , tQP_ y , or qQP y by that of 
cQPj. The resulting ratios again give raise to three ECDFs, 
one per basepoint set. We do the same thing for the solving 
times, obtaining three more ECDFs. 

Figure 12 shows ECDFs of these quality and time ratios 
for each of the basepoint sets. Most (92.6%, 90%, 86.7%) 
quality ratios were smaller than 1. However, solving con¬ 
figurations with surface-base approaches takes magnitudes 
longer than with cQP y ; 32%, 48%, and 70% of the surface 
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Figure 12: Comparing the quality and solving time ratios of 
cQP j with tQP* j and qQP^. 



Figure 13: Comparing the quality and solving time ratios of 
cQP j with tQP* j and qQP^. 

approaches took 100 times longer than the thinned curves 
approach. 

This longer solving time has two causes: First, cQP ; 
is much simpler and has a fast post-processing (Alloc) 
while tQPor qQP are invoked multiple times by SEARCH. A 
better but more complex post-processing, adjusting an over¬ 
provisioned solution obtained by tQPor qQP, could reduce 
their solving times; this is for further study. Second, Gurobi 
stopped improving the solution after one hour; so without 
this limit, the time factor likely increases. To conclude, among 
the linearised problems, the thinned curve approach cQP showed 
good quality with the shortest solving time in most of the cases. 

6.2.4 Triangle i/s. Qudrilateral Surfaces 
Similar to the comparison of curves vs. surfaces, we are 
interested in characterizing the behaviour of the different 
surface approaches. To this end, we compute quality and 
solving time ratios of the triangles divided by the quadrilat¬ 
eral approaches (Figure 13). 

For each group, in 65%, 72%, and 62% of all configura¬ 
tions, the quadrilateral approach qQP is faster than the other 
formulations. But in 27%, 18%, and 17% of all configurations, 
qQP is 10% slower than the other formulations. For each 
group, 87%, 87%, and 88% of all configurations are solved 
equally good or better with qQP than with the other formu¬ 
lations and in the remaining configurations qQP is worse 
than the other formulations. This confirms the structural 
arguments for the similarity between these two approaches. 
The first SEARCH iteration results in different over-utilised 
solutions' when using qQP or tQP*. Then, SEARCH continues 
solving problems with different limit p and that results in 
different solutions. To conclude, the problem qQP embedded in 
algorithm SEARCH obtains in most cases a very good solution in 
less time than the triangle-base problems. 


Table 5: Obtained solutions of all configurations with heuris¬ 
tic. 


Name, G 

Columbus 

Cesnet20 

guil39 

pioro 

Size, \N\ 

31 

38 

39 

40 

#Sol. % 

30 

39.7 

40 

89.3 

Name, G 

Colt 

UsCarrier 

Cogento 

Kdl 

Size, \N\ 

149 

152 

186 

726 

#Sol. % 

0.2 

0 

0.7 

0 


In addition, we compared solutions of tQP* and qQP of a 
subset of 200 configurations. Doing so, we could support two 
arguments present previously: (a) The triangle orientation 
influences the solution quality - the triangle orientation is 
the only difference between tQP_ and tQP + , where tQP 's 
solution quality were always smaller /better than tQP + 's 
solution quality, (b) Using quadrilaterals instead of triangles 
is reasonable - the solution and solution quality for tQP_ 
and qQP matched in all configurations. 

6.2.5 Heuristic 

The heuristic has a long solving time 20 , so a six-hour time 
limit was imposed; solving times beyond this limit are 
entirely impracticable for our scenario. Each configuration 
was solved 15 times to average out random variations in 
the heuristic Genetic itself. Some of these solving attempts 
were successful while others failed for the same configuration. 
Table 5 shows how many attempts were successful, grouped 
by topologies. 

In total, 85.1% of all attempts were not solved in time 
and most of them do not advance beyond Genetic's initial 
phase. The size of the topology corresponds directly to a large 
neighbourhood visited several times in DECENT (Section 5). 
This causes the dramatically long solving time. Improve¬ 
ments are possible, such as caching a history of DECENT calls 
avoiding visiting same neighbourhoods again, but a first 
implementation showed high memory demand of the cache 
and high runtime due to many cache checks. Concluding, 
using the genetic heuristic (in its current version) is impractical 
with its significantly worse quality and solving time. 

6.3 Scenario Variants 

This section investigates how application performance and 
resource distribution influence the average response times. 
The same setup as in the previous section was used but with 
different values for the compared factors focusing on the 
new investigation. 

For the application performance, we consider short, 
medium, and long request processing times corresponding to 
high, medium, and low service rates; /t=1; 100; 10.000 re q/s. 
The same total number of resources Jffkf=5\N\ were 
geographically distributed in five different ways (factor S). 
(a) 5=d5: \N\ available resources are assigned to the 5 nodes 
with largest degrees 21 ; (b) S= d: resources are distributed 

20. The heuristic was implemented in Python and utilised NumPy to 
speed up computations. In contrast, Gurobi is highly optimised. This 
biases the solving time comparison a bit, but the results reveal that even 
a highly optimised heuristic will be outperformed by Gurobi for large 
topologies. 

21. For all degree-based selections, the node ID was the tiebreaker. 
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Figure 15: Comparison of quality factors for three S against 

S=p 

across all nodes, weighted by degree (this was used in 
our previous work); (c) S d 2 : node degrees are squared, 
amplifying the effect of having more resources at better 
connected nodes, (d) S=c: As a baseline case, all available 
resources are placed at a single node having the lowest total 
latency to all other nodes,. This baseline case minimises 
the average queuing delay, (e) S=x: All nodes have 100 
available resources, eliminating effects of capacity limits (kf) 
and acting as an upper quality bound. 

The demand distribution follows only two mathemat¬ 
ical distributions, -Dc{N 2 ,Exp}. The resource limit p is 
restricted to low and high facility utilisation, p=|~5a|iV|], 
a=0.51; 0.6; 0.75. The same topologies are considered as in 
the previous section (Table 4). The resulting 720 factor com¬ 
binations each have 50 realisations of demand distribution, 
resulting in 36,000 configurations. These configurations are 
solved by qQP with basepoint set B=(6 1 4*). 

At first, the configurations are grouped into combinations 
of factors (p,,p, D). Within one group, the resource distribu¬ 
tion factors S are compared. For all groups, configurations 
with 100 resources everywhere (S=x) have, as expected, the 
shortest round trip times and response times. Configurations 
with resources at a single site ( S=c ) have, as expected, the 
longest round trip times and the shortest queuing delays in 
all groups. In addition, for some groups this factor results 
in the shortest response time showing that queuing delay 
drop compensates the longest round trip time. Consequently, 
using resources at multiple sites does not always have a shorter 
response times than a single-site resource allocation. Which factor 
S results in the second-shortest response time depends on 
the topology G but was independent of ft, p. D. All plots 
in Figure 14 compares average round trip times, response 
times, and queuing delay (the different between response 
times and round trip times) as a function of resource limit p, 
service rate ft, demand distribution D for the five groups of 
resource distributions S. 

How much a distributed deployment reduces the re¬ 
sponse time compared to a single-site deployment, the 
quality of different resource distributions S= d, d2, 5d 
are compared against S=c by computing quality factors. 
Figure 15 shows ECDFs for quality factors for each resource 
distribution. For these resource distributions, 78%, 87%, and 
97% of the configurations yield better response times than 
S=c (the quality ratio being smaller than 1) and 61%, 61%, 
35% have half or shorter response times (the quality ratios 


being smaller than 0.5). In conclusion, deploying application 
across multiple sites can (at least) halve response times. 


7 Conclusion 

This paper investigated the problem of allocating resources 
at multiple sites in order to minimise the user-perceived 
response time. Such a distributed deployment reduces 
response time by half or more compared to a single-site 
deployment (Section 6.3). Five different formulations of 
optimisation problems were presented, trading off quality 
against solving time. One of these techniques - thinned 
curves - seems particularly attractive as its solving times are 
vastly superior at only marginally reduced service response 
times. This technique, however, is somewhat sensitive to an 
improper choice of basepoints; the surface techniques are 
much more robust against a small number of basepoints. 

From the considered factors, the topology has - as 
expected - a strong influence on the optimal solution and its 
quality. Also, severely limited compute resources make the 
problem hard to solve well; the common wisdom of queuing 
theory to provide ample spare capacity is reinforced here in a 
distributed setting. Moreover, our results indicate that jointly 
considering queuing delay and RTT is indeed crucial when 
these two times are roughly of the same order of magnitude 
- otherwise, when one of the two times dominates, simpler 
optimisation models suffice to obtain good solutions. In 
fact, we found scenarios where single-site deployments 
outperformed distributed deployments. 

The presented formulation techniques are not limited to 
the paper's problem. Beyond this paper, any optimisation 
problem having a univariate or bivariate, non-linear cost 
function can be linearised by the presented approaches. 
We extended known approaches to mixed-integer objective 
functions which have not been treated in the literature so far. 

Our formulations are not restricted to convex/concave 
cost functions. In particular, a newly presented surface 
linearisation based on quadrilaterals instead of commonly 
used triangles turned out to be a promising alternative that 
is sometimes faster than the triangle-based formulations and 
has the same solution quality (Section 6.2.3). 

We introduced three greedy algorithms MaxWeight, 
Alloc, Dealloc for allocation problems where n tokens 
are placed in m buckets so that the costs are minimised. If 
the cost function c(j ) is decreasing in the number of buckets 
and convex, these algorithms are optimal. 

An advanced queuing system approximating a facility, 
data centre, or rack of compute nodes, (heterogeneous 
resources) could improve accuracy; but by how much is 
the question? 

The following unproven thought could improve all 
approaches using the algorithm ALLOC: When calling ALLOC 
without initial allocation the result is equal to but most likely 
better as providing an initial allocation as proposed in this 
paper. Even if the provided initial allocation is optimal for the 
used problem linearisation, the optimal allocation of ALLOC 
might performs better. Similarly, calling DEALLOC can be 
replaced by calling ALLOC without initial allocation instead. 

In summary, this paper not only improves service re¬ 
sponse times by optimally allocating resources but also 
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presents optimisation techniques and greedy algorithms 
applicable beyond this scenario. 
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8 Appendix 

8.1 N fc (a) Derivative and Transformation 

This sections provide the transformation of first derivative N of a with fix k into a term using recursive V(a, k) (47). 
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